---
title: "Understanding Peanuts and Schulzian Symmetry: Panel Detection, Caption Detection, and Gag Panels in 17,897 Comic Strips Through Distant Viewing."
output:
  html_document:
    theme: cosmo
    highlight: zenburn
    css: "../css/note-style.css"
editor_options: 
  markdown: 
    wrap: 72
---

# Dataset Description

All necessary files are contained in the larger folder, which has three
main subfolders:

-   data
    - This contains each of our primary datasheets.
-   functions
    - This contains one "workhorse" R script that only needs to be
      called infrequently.
-   notebooks
    - This contains the current R Markdown notebook.
    


# Getting Started

Before running this notebook, select "Session \> Restart R and Clear
Output" in the menu above to start a new R session. This will clear any
old data sets and give us a blank slate to start with.

After starting a new session, run the following code chunk to load the
libraries and helper functions that we will be working with in this
tutorial.

```{r, include=FALSE, message=FALSE}
source("../functions/UR_comics_functions.R")
```

# Case Study: Peanuts by Charles Schulz

Today we are going to look at a dataset describing the long-running
comic strip "Peanuts" by Charles Schulz. "Peanuts" contains 17,897
unique comic strips, each written and drawn by Schulz.

To begin, run the following code chunk to load our primary dataset table.
It contains one row for each Peanuts comic strip from 1950 to 2000. The
last two columns describe the size of the scanned image of the comic in
pixels.

```{r}
peanuts_comics <- read_csv("../data/UR_Peanuts_metadata.bz2")
peanuts_comics
```

We have another dataset that was automatically generated using a
computer vision algorithm. This one describes all of the panels that
were detected in each comic strip. It gives the coordinates of each
vertex of the panel relative to the entire image.

```{r}
peanuts_panels <- read_csv("../data/UR_Peanuts_panels.bz2")
peanuts_panels
```
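
The figure code later in this notebook joins against a table named
`panels_total` that holds a panel count `n` for each `image_path`. That
table is not created anywhere in this notebook, so it is presumably
built by the sourced functions file. If it is missing from your session,
the following is a minimal sketch of how it could be derived. It assumes
that `peanuts_panels` has one row per detected panel and an `image_path`
column; the helper name `count_panels` and the toy file names are
hypothetical and used only for illustration.

```{r}
library(dplyr)

# Hypothetical helper (not part of the original notebook):
# count the number of detected panels for each comic strip image.
count_panels <- function(panels) {
  panels %>% count(image_path, name = "n")
}

# Toy illustration with made-up file names: three panels in one strip,
# one panel in another.
toy_panels <- tibble(image_path = c("a.png", "a.png", "a.png", "b.png"))
count_panels(toy_panels)

# In this notebook, only if `panels_total` is not already defined
# (assumes `peanuts_panels` has one row per panel):
# panels_total <- count_panels(peanuts_panels)
```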

We have a third dataset of comic captions. These describe all of the locations 
in the comic where our computer vision algorithm detected text.

```{r}
peanuts_captions <- read_csv("../data/UR_Peanuts_captions.bz2")
peanuts_captions
```

Lastly, we have a datasheet that combines the panel and caption
positioning data.

```{r}
peanuts_prop <- read_csv("../data/UR_Peanuts_props.csv")
peanuts_prop
```

## Code for figures

Each of the following code chunks creates a figure in the paper. 

[Figure 05: Distant viewing visualization of Peanuts comics published in
1975. Note the consistency of the daily panels, the fluctuations in
Sunday panel length, and the five outlying strips with more than four
panels.]

```{r}
peanuts_comics %>% 
  inner_join(panels_total, by = "image_path") %>% 
  mutate(wday = wday(date, label = TRUE), year = year(date)) %>% 
  mutate(wday = if_else(wday == "Sun", "Sun", "Dailies")) %>% 
  filter(year == 1975) %>%  # `year` is numeric, so compare against a number
  ggplot() +
  geom_point(aes(date, n, color = wday)) +  
  scale_x_date(
    date_breaks = "1 month",
    date_labels = "%b",
    date_minor_breaks = "1 month"
  ) + 
  labs(x = "Peanuts comics published in 1975", y = "number of panels")
```
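
The figure chunks separate Sunday strips from dailies by comparing the
weekday label to the string `"Sun"`. In lubridate that label depends on
the session's locale, so the comparison may not match on a non-English
system. The following is a minimal sketch of a locale-independent
alternative; the helper name `strip_type` is hypothetical and not part
of the original notebook.

```{r}
library(lubridate)

# Hypothetical helper (not in the original notebook): classify a date as a
# Sunday strip or a daily strip using the numeric weekday instead of its label.
strip_type <- function(date) {
  # week_start = 7 makes Sunday day 1, whatever the session's lubridate options
  ifelse(wday(date, week_start = 7) == 1, "Sun", "Dailies")
}

# February 29, 1988 fell on a Monday; February 13, 2000 fell on a Sunday.
strip_type(as.Date(c("1988-02-29", "2000-02-13")))
```

The same expression could replace the two `mutate()` calls on `wday` in
the figure chunks, for example
`mutate(wday = strip_type(date), year = year(date))`.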

[Figure 07: Distant viewing visualization of the average number of
panels in Peanuts strips by year, divided between daily strips and
Sunday strips. Note the change in publishing length for daily strips
beginning in 1988. The year 2000 is an outlier because it is incomplete:
Schulz died on February 12, 2000, and the strip ended on February 13,
2000.]

```{r, Figure 7}
peanuts_comics %>% 
  inner_join(panels_total, by = "image_path") %>% 
  mutate(wday = wday(date, label = TRUE), year = year(date)) %>% 
  mutate(wday = if_else(wday == "Sun", "Sun", "Dailies")) %>% 
  group_by(year, wday) %>% 
  # Mean panel count per year and strip type; drop grouping afterwards
  summarize(n = mean(n), .groups = "drop") %>% 
  ggplot(aes(year, n)) +
  geom_point(aes(color = wday)) + 
  labs(y = "number of panels")
```


[Figure 08: Distant viewing visualization of the number of panels in
each strip published in 1988. The steep change in variance begins on
February 29, 1988, as Schulz shifts further and further away from the
four-panel format until his death in February 2000.]

```{r, Figure 08}
peanuts_comics %>% 
  inner_join(panels_total, by = "image_path") %>% 
  mutate(wday = wday(date, label = TRUE), year = year(date)) %>% 
  mutate(wday = if_else(wday == "Sun", "Sun", "Dailies")) %>% 
  filter(year == 1988) %>%  # `year` is numeric, so compare against a number
  ggplot() +
  geom_point(aes(date, n, color = wday)) +  
  scale_x_date(
    date_breaks = "1 month",
    date_labels = "%b",
    date_minor_breaks = "1 month"
  ) + 
  labs(x = "Peanuts comics published in 1988", y = "number of panels")
```

[Figure 12: Percentage of panels containing textual captions in Peanuts,
by panel position and total number of panels, for strips of one to
twelve panels, which were the most common formats in Peanuts. Of
significance is the high rate of caption presence in the first and last
panels of any Peanuts strip, regardless of the number of panels.
Likewise, note the above-average rate of caption presence in any panel.]

```{r, Figure 12}
peanuts_prop %>%
  group_by(image_path) %>%
  mutate(num_panels = max(panel_id)) %>%
  group_by(num_panels, panel_id) %>%
  # Percentage of panels at each position that contain at least one caption
  summarize(pct_with_text = 100 * mean(cap_num != 0), n = n(), .groups = "drop") %>%
  filter(num_panels <= 12) %>%
  # Highlight the first and last panel of each strip length
  mutate(color = if_else(panel_id == 1, "#fabd2f", "#ebdbb2")) %>%
  mutate(color = if_else(panel_id == num_panels, "#83a598", color)) %>%
  mutate(perc_text = sprintf("%d%%", round(pct_with_text))) %>%
  # Only label the highlighted bars
  mutate(perc_text = if_else(color == "#ebdbb2", "", perc_text)) %>%
  ggplot(aes(factor(panel_id), pct_with_text)) +
    geom_col(aes(fill = color)) +
    geom_text(aes(color = color, label = perc_text, y = pct_with_text + 6), size = 2) +
    facet_wrap(~num_panels, nrow = 3) +
    scale_fill_identity() +
    scale_color_identity() +
    labs(x = "Panel Number", y = "Percentage of Panels With Text",
         title = "Distribution of Textual Captions in Peanuts",
         subtitle = "By number of total panels")
```
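
The summary behind Figure 12 can be easier to follow on a toy table. The
following is a minimal sketch, assuming that `peanuts_prop` has one row
per panel with the columns `image_path`, `panel_id`, and a caption count
`cap_num`, as the chunk above uses them. The file names and values are
made up, and the percentage column is called `pct_with_text` here.

```{r}
library(dplyr)

# Made-up data: two three-panel strips, one row per panel.
toy_prop <- tibble(
  image_path = rep(c("strip_a.png", "strip_b.png"), each = 3),
  panel_id   = rep(1:3, times = 2),
  cap_num    = c(2, 0, 1,   # strip_a: text in panels 1 and 3
                 1, 0, 0)   # strip_b: text in panel 1 only
)

toy_summary <- toy_prop %>%
  group_by(image_path) %>%
  mutate(num_panels = max(panel_id)) %>%   # strip length = highest panel id
  group_by(num_panels, panel_id) %>%
  summarize(pct_with_text = 100 * mean(cap_num != 0), n = n(), .groups = "drop")

toy_summary
```

In this toy table, panel 1 has text in both strips (100%), panel 2 in
neither (0%), and panel 3 in one of the two (50%).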
